Leveraging Dependency Regularization for Event Extraction
Authors
Abstract
Event Extraction (EE) is a challenging Information Extraction task which aims to discover event triggers with specific types and their arguments. Most recent research on Event Extraction relies on pattern-based or feature-based approaches, trained on annotated corpora, to recognize combinations of event triggers, arguments, and other contextual information. These combinations may each appear in a variety of linguistic forms. Not all of these event expressions will have appeared in the training data, thus adversely affecting EE performance. In this paper, we demonstrate the overall effectiveness of Dependency Regularization techniques to generalize the patterns extracted from the training data to boost EE performance. We present experimental results on the ACE 2005 corpus, showing improvement over the baseline system, and consider the impact of the individual regularization rules.
Introduction
Event Extraction (EE) involves identifying instances of specified types of events and the corresponding arguments in text, which is an important but difficult Information Extraction (IE) task. Associated with each event mention is a phrase, the event trigger (most often a single verb or nominalization), which evokes that event. More precisely, our task involves identifying event triggers associated with corresponding arguments and classifying them into specific event types. For instance, according to the ACE 2005 annotation guidelines, in the sentence “[She] was killed by [an automobile] [yesterday]”, an event extraction system should be able to recognize the word “killed” as a trigger for the event DIE, and discover “an automobile” and “yesterday” as the Agent and Time Arguments. This task is quite challenging, as the same event might appear in the form of various trigger expressions and an expression might represent different events in different contexts.
Most recent research on Automatic Content Extraction (ACE) Event Extraction relies on pattern-based or feature-based approaches to building classifiers for event trigger and argument labeling. Although the training corpus is quite large (300,000 words), the test data will inevitably contain some event expressions that never occur in the training data. To address this problem, we propose several Dependency Regularization methods to help generalize the syntactic patterns extracted from the training data in order to boost EE performance. Among the syntactic representations, dependency relations serve as important features or part of a pattern-based framework in IE systems, and play a significant role in IE approaches. These proposed regularization rules will be applied either to the dependency parse outputs of the candidate sentences or to the patterns themselves to facilitate detecting the event instances. The experimental results demonstrate that our pattern-based system with the expanded patterns can achieve substantial improvement over the baseline, which is an advance over the state-of-the-art systems.
The paper is organized as follows: we first describe the role of dependency analysis in event extraction and how dependency regularization methods can improve EE performance. In the sections which follow, we describe our EE systems including the baseline and enhanced system utilizing dependency regularization, we present experimental results, and we discuss related work.
ACE 2005 annotation guidelines: https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-events-guidelines-v5.4.3.pdf
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Dependency Regularization
The ACE 2005 Event Guidelines specify a set of 33 types of events, and these have been widely used for research on event extraction over the past decade. Some trigger words are unambiguous indicators of particular types of events.
For example, the word murder indicates an event of type DIE. However, most words have multiple senses and so may be associated with multiple types of events. Many of these cases can be disambiguated based on the semantic types of the trigger arguments:
• fire can be either an ATTACK event (“fire a weapon”) or an END-POSITION event (“fire a person”), with the cases distinguishable by the semantic type of the direct object. discharge has the same ambiguity and the same disambiguation rule.
• leave can be either a TRANSPORT event (“he left the building”) or an END-POSITION event (“he left the administration”), again generally distinguishable by the type of the direct object.
Given a training corpus annotated with triggers and event arguments, we can assemble a set of frames and link them to particular event types. Each frame will record the event arguments and their syntactic (dependency) relation to the trigger. When decoding new text, we will parse it with a dependency parser, look for a matching frame, and tag the trigger candidate with the corresponding event type. One complication is that the frames may be embedded in different syntactic structures: verbal and nominal forms, relative clauses, active and passive voice, etc. Because of the limited size of the training corpus, some triggers will appear with frames not seen in the training corpus. To fill these gaps, we will employ a set of dependency regularization rules which transform the syntactic structure of the input to reduce variation. We describe here three of the regularization rules we use:
1. Verb Chain Regularization
2. Passive Voice Regularization
3. Relative Clause Regularization
Verb Chain Regularization
We use a fast dependency parser (Tratz and Hovy 2011) that analyzes multi-word verb groups (with auxiliaries) into chains with the first word at the head of the chain. Verb Chain (vch) Regularization reverses the verb chains to place the main (final) verb at the top of the dependency parse tree.
This reduces the variation in the dependency paths from trigger to arguments due to differences in tense, aspect, and modality. Here is an example sentence containing a verb chain:
(1) Kobe has defeated Michael.
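The effect of Verb Chain Regularization on example (1) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The parse encoding (a dictionary from token index to head index and label), the `nsubj` and `dobj` labels, the function names, and the choice to move an auxiliary's arguments onto the main verb are all assumptions made for illustration; only the `vch` relation and the idea of reversing the chain come from the text.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: heads[i] = (index of token i's head, relation label),
# with index 0 standing for an artificial root node.
Parse = Dict[int, Tuple[int, str]]

VCH = "vch"  # verb-chain relation named in the text


def regularize_verb_chains(heads: Parse) -> Parse:
    """Reverse every verb chain so that its final (main) verb heads it."""
    # Assumption: each verb has at most one vch dependent.
    vch_child = {h: d for d, (h, label) in heads.items() if label == VCH}
    chained = set(vch_child.values())
    new_heads = dict(heads)
    for top in vch_child:
        if top in chained:
            continue  # this verb sits inside a chain, not at its start
        chain = [top]
        while chain[-1] in vch_child:
            chain.append(vch_child[chain[-1]])
        main, auxiliaries = chain[-1], set(chain[:-1])
        # The main verb takes over the attachment of the chain's first word.
        new_heads[main] = heads[top]
        # Reverse the chain: each word now depends on the word after it.
        for earlier, later in zip(chain, chain[1:]):
            new_heads[earlier] = (later, VCH)
        # Assumption: arguments attached to an auxiliary move to the main verb.
        for dep, (head, label) in heads.items():
            if head in auxiliaries and dep not in chain:
                new_heads[dep] = (main, label)
    return new_heads


def frame(heads: Parse, words: List[str], trigger: int) -> Set[Tuple[str, str]]:
    """Collect (relation, word) pairs for the trigger's non-vch dependents."""
    return {
        (label, words[dep])
        for dep, (head, label) in heads.items()
        if head == trigger and label != VCH
    }


# Example (1): "Kobe has defeated Michael", with "has" heading the chain.
words_perfect = ["<root>", "Kobe", "has", "defeated", "Michael"]
parse_perfect = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "vch"), 4: (3, "dobj")}

# Simple past variant with no verb chain: "Kobe defeated Michael".
words_past = ["<root>", "Kobe", "defeated", "Michael"]
parse_past = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "dobj")}

regularized = regularize_verb_chains(parse_perfect)
frame_perfect = frame(regularized, words_perfect, trigger=3)
frame_past = frame(regularize_verb_chains(parse_past), words_past, trigger=2)
print(frame_perfect == frame_past)
```

Under these assumptions, the trigger "defeated" ends up with the same subject and object relations in both the present perfect and the simple past sentence, so a frame learned from one form could match the other.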
Similar resources
Improving Event Detection with Dependency Regularization
Event Detection (ED) is an Information Extraction task which involves identifying instances of specified types of events in text. Most recent research on Event Detection relies on pattern-based or feature-based approaches, trained on annotated corpora, to recognize combinations of event triggers, arguments, and other contextual information. These combinations may each appear in a variety of ling...
Leveraging Multilingual Training for Limited Resource Event Extraction
Event extraction has become one of the most important topics in information extraction, but to date, there is very limited work on leveraging cross-lingual training to boost performance. We propose a new event extraction approach that trains on multiple languages using a combination of both language-dependent and language-independent features, with particular focus on the case where target doma...
Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks
In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers — either directly...
RBPB: Regularization-Based Pattern Balancing Method for Event Extraction
Event extraction is a particularly challenging information extraction task, which intends to identify and classify event triggers and arguments from raw text. In recent works, when determining event types (trigger classification), most of the works are either pattern-only or feature-only. However, although patterns cannot cover all representations of an event, it is still a very important featu...
An extended dependency graph for relation extraction in biomedical texts
Kernel-based methods are widely used for relation extraction task and obtain good results by leveraging lexical and syntactic information. However, in biomedical domain these methods are limited by the size of dataset and have difficulty in coping with variations in text. To address this problem, we propose Extended Dependency Graph (EDG) by incorporating a few simple linguistic ideas and inclu...